Self-Organizing Databases: Near-Optimal Query Performance at all Times Using Flexible Views
Abstract
Project Summary: The goal of this project is to develop new, effective methods for improving the performance of sets of frequent and important queries on large relational databases at all times, which could improve the efficiency of user interactions with data-management systems. Solving the problem will have the most effect in query optimization, data warehousing, and information integration, which are important research topics with direct practical applications. Moreover, our research program offers a unique test case for a fundamental understanding of optimality in answering queries and, more generally, in database performance. The project focuses on the methodology of evaluating queries using views; views are relations that are defined by auxiliary queries and can be used to rewrite and answer user queries. One way to improve query performance is to precompute and store (i.e., materialize) views. To truly optimize query performance, it is critical to materialize the "right" views. The current focus of the project is on demonstrating that, by designing and materializing views, it is possible to ensure optimal or near-optimal performance of frequent and important queries, for common and important query types. We consider this problem in the broader context of designing self-organizing databases: a self-organizing database periodically determines, without human intervention, a representative set of frequent and important queries on the data, and incrementally designs and precomputes the optimal (or near-optimal) views for that representative query workload. As the representative query workload and the stored data change over time, self-organizing databases adapt by changing the set of materialized views that are used to improve query-answering performance in the database. This approach has the potential to lead to dramatic improvements in the efficiency of user interactions with many types of data-management systems.
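As a minimal sketch of what "materializing a view and using it to rewrite a query" means, the following Python example uses the standard-library `sqlite3` module rather than the project's PostgreSQL setup. The `sales` table, the view name `mv_sales_by_region_product`, and both query functions are hypothetical names chosen for illustration; SQLite has no built-in materialized views, so the view is emulated by storing its result in an ordinary table.

```python
import sqlite3


def build_demo():
    """Create a tiny in-memory database and one materialized view (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("east", "widget", 10),
            ("east", "gadget", 5),
            ("west", "widget", 7),
            ("west", "widget", 3),
        ],
    )
    # "Materialize" the view: precompute the aggregate and store it as a table.
    conn.execute(
        """
        CREATE TABLE mv_sales_by_region_product AS
        SELECT region, product, SUM(amount) AS total
        FROM sales
        GROUP BY region, product
        """
    )
    return conn


def total_by_region_base(conn):
    """Answer the user query directly from the base table."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


def total_by_region_view(conn):
    """Equivalent rewriting of the same query over the smaller, precomputed view."""
    return conn.execute(
        "SELECT region, SUM(total) FROM mv_sales_by_region_product "
        "GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(total_by_region_base(conn))
    print(total_by_region_view(conn))
```

Both functions return the same rows; the rewriting reads the precomputed view, which on a large database would typically contain far fewer rows than the base table. Keeping such a view consistent when `sales` changes is a separate problem that this sketch does not address.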
Solving the problem of building self-organizing databases will have the most effect in query optimization, data warehousing, and information integration. For building self-organizing databases, we consider an end-to-end solution; that is, we consider all aspects of handling and using views, including:
• designing and materializing views and indexes to improve query performance;
• exploring the effects of materialized views on the process of query optimization;
• adapting view design to the changing query workload, including the process of retiring views that are no longer useful;
• developing methods for automatically updating existing materialized views over time, to reflect the changes in the stored data;
• developing methods to collect database statistics to reliably estimate the sizes of the views the system considers for materialization;
• analyzing the use of system resources and allocating an appropriate amount of resources to view management in the system.
In our research in self-organizing databases, we adopt an approach that combines theoretical work (for some aspects of the project) with extensive implementation (in C++) and experimentation based on an open-source database system called PostgreSQL, http://www.postgresql.org. The students involved in the project will get hands-on experience in conducting research in databases, will develop implementation skills in database systems, and will learn to set up, conduct, analyze, and report experiments related to database performance on large amounts of data. In the current stage of the project, we are setting up PostgreSQL and are also concentrating on developing efficient and scalable heuristic algorithms that design (near-)optimal sets of views for the given queries. This project has two parts: (1) theoretical analysis and design of algorithms and heuristics for view design, and (2) implementation and experiments on large databases, to evaluate the performance improvements caused by using the views.
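To make the idea of workload-driven view design and view retirement more concrete, here is a hedged Python sketch of one simple heuristic: rank candidate views by estimated benefit per unit of storage and pick greedily under a storage budget. This is not the project's algorithm. The `CandidateView` class, the `select_views` function, and all sizes, savings, and frequencies are hypothetical, and the sketch assumes that view sizes are positive and that the savings of different views are independent, which real view-selection methods cannot assume.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateView:
    """A hypothetical candidate view with estimated statistics."""
    name: str
    size: int  # estimated storage cost (arbitrary units, assumed > 0)
    savings: dict = field(default_factory=dict)  # query name -> cost saved per execution


def select_views(candidates, workload, budget):
    """Greedy heuristic: take views in order of benefit per unit size while they fit.

    `workload` maps query names to execution frequencies.
    Interactions between views (two views speeding up the same query) are ignored.
    """
    def benefit(view):
        return sum(workload.get(q, 0) * saved for q, saved in view.savings.items())

    useful = [v for v in candidates if benefit(v) > 0]
    ranked = sorted(useful, key=lambda v: benefit(v) / v.size, reverse=True)

    chosen, remaining = [], budget
    for view in ranked:
        if view.size <= remaining:
            chosen.append(view.name)
            remaining -= view.size
    return chosen


if __name__ == "__main__":
    candidates = [
        CandidateView("v1", size=10, savings={"q1": 5}),
        CandidateView("v2", size=20, savings={"q2": 4}),
        CandidateView("v3", size=5, savings={"q3": 1}),
    ]
    old_workload = {"q1": 100, "q2": 10, "q3": 0}
    new_workload = {"q1": 0, "q2": 50, "q3": 40}

    old_views = select_views(candidates, old_workload, budget=25)
    new_views = select_views(candidates, new_workload, budget=25)
    retired = set(old_views) - set(new_views)  # views that are no longer useful
    print(old_views, new_views, retired)
```

Re-running the selection when the representative workload changes yields a new view set; views present in the old set but absent from the new one are candidates for retirement. The view sizes fed into such a heuristic would come from the size-estimation statistics mentioned in the list above.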
Funding for research in self-organizing databases is likely in the near future; success in obtaining funding is contingent on the success of the ongoing stages of the project.
Similar resources
View-Size Estimation in Self-Organizing Databases
Data-intensive systems routinely use derived data, such as indexes or materialized views, to improve query-evaluation performance. In this context, the problem of designing derived data is as follows: Given a database and a set of queries, return definitions of derived data that, when precomputed and stored in the database, would reduce the evaluation costs of the queries. Designing materialized...
View-based Recognition using SHOSLIF
We describe a self-organizing framework for content-based retrieval of images from large image databases at the object recognition level. The system uses the theories of optimal projection for optimal feature selection and a hierarchical image database for rapid retrieval rates. We demonstrate the query technique on a large database of widely varying real-world objects in natural settings, and ...
COMPARISON BETWEEN MINIMUM AND NEAR MINIMUM TIME OPTIMAL CONTROL OF A FLEXIBLE SLEWING SPACECRAFT
In this paper, minimum and near-minimum time optimal control laws are developed and compared for a rigid space platform with flexible links during an orienting maneuver with a large angle of rotation. The control commands are considered as typical bang-bang with multiple symmetrical switches; the time optimal control solution for the rigid-body mode is obtained as a bang-bang function and app...
Designing Views to Optimize Real Queries
This paper considers the following problem: given a query workload, a database, and a set of constraints, design a set of views that give equivalent rewritings of the workload queries and globally minimize the evaluation costs of the workload on the database under the constraints. We refer to this problem as “view design for query performance,” or “view design” for short; sets of views that sat...
Optimizing Physical Design of Multidimensional Files for Join Queries
Optimally organizing multidimensional data is NP-hard. The little work that has been done on optimising multidimensional data was limited to uniform data distributions and rarely considered the probability of use of each query. The studies that did consider the probability of use of each query were limited to either partial match queries or range queries. This work shows that by combining heurist...
Publication date: 2003